Much of this course is taught from a primarily Bayesian perspective,
which provides a more principled and intuitive framework for
quantitative analysis and probabilistic reasoning. For much of what we
cover in terms of application, we’ll be using the R package brms, which
provides a user-friendly and computationally efficient interface to
Stan’s implementation of the No-U-Turn Sampler, an adaptive Hamiltonian
Monte Carlo algorithm (Bürkner 2017, 2018; Carpenter et al. 2017;
Hoffman and Gelman 2014). After you have installed R, Rtools/Xcode, and
RStudio as detailed on the “Getting Started with R” page, this guide
will walk you through the process of installing Stan, brms, and
the necessary dependencies. You can download all of the code shown in
this document in the form of a script or copy and paste the code shown
here into your RStudio console.
To begin, I recommend setting some global options for the R session
as shown below and setting the MAKEFLAGS environment variable to
enable multi-core compilation, which will help speed up installation.
Note, however, that restarting your R session resets the global options
to their defaults, so you may need to run this particular code block
more than once during the installation process.
# Set Session Options
options(
  digits = 6, # Significant figures output
  scipen = 999, # Disable scientific notation
  repos = getOption("repos")["CRAN"] # Install packages from CRAN
)
# Set the makeflags to use multiple cores for faster compilation
Sys.setenv(
  MAKEFLAGS = paste0(
    "-j",
    parallel::detectCores(logical = FALSE)
  ))
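One caveat with the block above: parallel::detectCores() is documented as possibly returning NA when the number of cores cannot be determined, in which case the flag would become "-jNA". The following is a minimal sketch of a more defensive version. Note that make_jobs_flag is a hypothetical helper name introduced here for illustration and is not part of any package.

```r
# Hypothetical helper: build a "-jN" flag for make, falling back to a
# single job when the number of physical cores cannot be determined
make_jobs_flag <- function(n_cores = parallel::detectCores(logical = FALSE)) {
  if (length(n_cores) != 1 || is.na(n_cores) || n_cores < 1) {
    n_cores <- 1L
  }
  paste0("-j", as.integer(n_cores))
}

make_jobs_flag(NA) # "-j1"
make_jobs_flag(4)  # "-j4"

# Set the environment variable and confirm the value that was stored
Sys.setenv(MAKEFLAGS = make_jobs_flag())
Sys.getenv("MAKEFLAGS")
```

Printing the stored value is also a quick way to check whether the variable needs to be set again after restarting the R session.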
After setting the global options for the session, we’ll check if any
existing Stan packages are installed (unlikely, but it’s good to be
safe) and if so, remove them. After this is done, the next block checks
that required packages for subsequent steps in the installation process
are installed and installs them if they aren’t already present. In
passing, notice how the code below is wrapped in a pair of curly braces.
In R, this means that everything inside the {...} is parsed as a single
expression before any of it is evaluated, rather than line-by-line. This
is useful for writing if, else if, and else statements outside of
functions, since an else that begins a new line at the top level would
otherwise be a syntax error.
# Check if any existing Stan packages are installed
{
  ## Check for existing installations (exact package names only)
  installed_pkgs <- installed.packages()[, 1]
  stan_packages <- installed_pkgs[
    grepl("^(cmdstanr|rstan|StanHeaders|brms)$", installed_pkgs)]
  ## Remove only the Stan packages that were actually found
  if (length(stan_packages) > 0) {
    remove.packages(stan_packages)
  }
  ## Delete any pre-existing RData file
  if (file.exists(".RData")) {
    file.remove(".RData")
  }
}
# Check if packages necessary for later installation steps are installed
{
  ## Retrieve installed packages
  pkgs <- installed.packages()[, 1]
  ## Identify which of the required packages are missing
  missing_pkgs <- setdiff(c("rstudioapi", "remotes"), pkgs)
  ## Install any packages that are missing
  if (length(missing_pkgs) > 0) {
    print(paste("Installing:", paste(missing_pkgs, collapse = ", ")))
    install.packages(missing_pkgs)
  }
  ## Else print a message
  else {
    print("{remotes} and {rstudioapi} packages are already installed")
  }
}
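To make the earlier note about curly braces concrete, here is a small sketch that checks whether a piece of code parses with and without the surrounding braces. It uses only base R, and parses is a hypothetical helper name introduced for this illustration.

```r
# Code in which `else` begins a new line, as in the blocks above
unbraced <- "if (FALSE) {\n  x <- 1\n}\nelse {\n  x <- 2\n}"
# The same code wrapped in a pair of curly braces
braced <- paste0("{\n", unbraced, "\n}")

# Hypothetical helper: TRUE if the text parses as valid R code
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses(unbraced) # FALSE, the parser reports an unexpected 'else'
parses(braced)   # TRUE
```

Without the braces, R treats the if statement as complete at the end of its closing brace, so the else on the following line has nothing to attach to.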
If you are on a computer with a Windows operating system and you followed the instructions on the “Getting Started with R” page, it should not be necessary to manually configure the C++ toolchain. For macOS users, the installation steps above should work as long as you are on a recent version of Catalina. If you run into errors during installation or subsequent compilation, consult the documentation for configuring the C++ toolchain on Macs and notify me of any issues as soon as possible so we can get them resolved.
Once we’ve installed the necessary packages using the code in the
previous section, we can install rstan, the main R interface to Stan,
along with the required headers for the Stan math library. Since the
StanHeaders package is a dependency of rstan, installing rstan using
the code below will install both rstan and StanHeaders.
# Install the development versions of rstan and StanHeaders
install.packages(
pkgs = "rstan",
repos = c(
"https://mc-stan.org/r-packages/",
getOption("repos")
))
To check that the installation was successful and everything is working properly in the backend, you can execute the following code in R. Once you have verified everything runs without any errors, this is a good time to restart your R session before proceeding to the next step.
# This will fit a simple example model to check that the Stan compiler is working
example(stan_model, package = "rstan", run.dontrun = TRUE)
# You can either manually restart your R session via RStudio's GUI or run this code
rstudioapi::restartSession()
Next, we’ll proceed to installing the {brms} package. To
get the most recent development version we’ll use the
install_github function from the {remotes}
package as shown below.
# Install the latest development version of brms from github
remotes::install_github("paul-buerkner/brms")
If you are prompted to update existing R packages, type
1 in the console and press enter to proceed. An additional
window may appear asking whether you would like to install from source
any packages that have a more recent source version available, in which
case you should choose “no”, as compiling them may cause the {brms}
installation to fail. If the package installs without any errors, you
can proceed to the next step.
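Before moving on, it can help to confirm which versions were actually installed. The following is a small sketch using only base R. Note that pkg_status is a hypothetical helper name introduced here; it returns NA for a package that is not installed rather than raising an error.

```r
# Hypothetical helper: installed version of a package as a string,
# or NA if the package cannot be found in the library paths
pkg_status <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Versions of the packages installed so far in this guide
vapply(c("StanHeaders", "rstan", "brms"), pkg_status, character(1))
```

Any NA in the result indicates that the corresponding installation step should be repeated.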
The {brms} package also allows you to use {cmdstanr}, a lightweight
alternative to {rstan}, as its backend. This makes it possible to use
the latest version of the Stan math libraries and cmdstan. Since
{rstan} development tends to lag behind Stan, this will often yield
substantial performance gains by allowing you to utilize the latest
updates to the Stan language and can have the added bonus of being more
stable on certain operating systems.
First, we’ll start by installing the {cmdstanr} package
from github using the same approach we used to install
{brms} in the previous section.
# Install cmdstanr from github
remotes::install_github("stan-dev/cmdstanr")
Once we’ve successfully installed {cmdstanr}, we can use
the check_cmdstan_toolchain function with the
fix argument set to TRUE to check if the C++
toolchain needs to be configured further and if so, automatically apply
the correct configuration.
# Check that the C++ Toolchain is Configured
cmdstanr::check_cmdstan_toolchain(fix = TRUE)
The C++ toolchain required for CmdStan is setup properly!
After verifying the toolchain configuration is correct, we can run the following code to download and compile the latest release of cmdstan, which at the time of writing this tutorial is version 2.30.1.
# Install cmdstan version 2.30.1
cmdstanr::install_cmdstan(
cores = parallel::detectCores(logical = FALSE),
overwrite = TRUE,
version = "2.30.1", # Defaults to the latest version if not specified
cpp_options = list("STAN_THREADS" = TRUE),
check_toolchain = TRUE
)
If cmdstan compiles without any errors, you should be able to verify the installation and ensure the installation path has been set correctly by running the following code.
# Verify that cmdstan installed successfully
(cmdstan.version <- cmdstanr::cmdstan_version())
[1] "2.30.1"
# Ensure cmdstan path is set properly
cmdstanr::set_cmdstan_path(
  path = file.path(
    Sys.getenv("HOME"),
    ".cmdstan",
    paste0("cmdstan-", cmdstan.version)
  ))
CmdStan path set to: E:/Users/Documents/.cmdstan/cmdstan-2.30.1
As the output shows, cmdstan has been successfully installed to the
directory E:/Users/Documents/.cmdstan/cmdstan-2.30.1. The
final step in the installation process is to set the path environment
variable for the Intel TBB library, which we can do by running the code
shown below. Since mingw32-make is part of the Windows toolchain, this
step applies to Windows.
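# Execute `mingw32-make install-tbb` in the terminal
tbb_terminal <- rstudioapi::terminalExecute(
  command = "mingw32-make install-tbb",
  workingDir = cmdstanr::cmdstan_path()
)
# Wait for the command to finish before resetting the terminal
while (is.null(rstudioapi::terminalExitCode(tbb_terminal))) {
  Sys.sleep(0.5)
}
# Reset the terminal
rstudioapi::terminalKill(id = rstudioapi::terminalList())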
Note that for this change to take effect, you will need to close and reopen RStudio after executing the terminal command before proceeding to the next section.
Finally, to verify that the installation was successful and
everything works correctly, we can fit a simple linear model using
{brms} as shown below. For our purposes here, we’ll use the
built-in mtcars data set and model fuel efficiency
(mpg) as a linear function of weight (wt).
# Load the brms library
library(brms)
# Load the built-in mtcars data
data("mtcars")
# Fit the model
bayes_mpg_fit <- brm(
  formula = mpg ~ wt, # Formula describing the model
  family = gaussian(), # Linear regression
  prior = prior(normal(0, 1), class = b), # Prior on the coefficients
  data = mtcars, # Data for the model
  cores = 4, # Number of cores to use for parallel chains
  chains = 4, # Number of chains, should be at least 4
  iter = 2000, # Total iterations = Warm-Up + Sampling
  warmup = 1000, # Warm-Up Iterations
  refresh = 0, # Disable printing progress
  save_pars = save_pars(all = TRUE), # Save draws of all parameters
  backend = "cmdstanr" # Requires cmdstanr and cmdstan be installed
)
Start sampling
Running MCMC with 4 parallel chains...
Chain 1 finished in 0.0 seconds.
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 0.3 seconds.
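If everything was installed and configured successfully, the model
should compile and sampling should then finish in well under a second
(0.3 seconds in the output above, although compiling the model takes
longer). You can then obtain a summary of the results using the summary
function.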
# Print a summary of the fitted model
summary(bayes_mpg_fit)
 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: mpg ~ wt
   Data: mtcars (Number of observations: 32)
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    32.25      2.12    27.78    36.02 1.00     2579     2134
wt           -3.79      0.63    -4.91    -2.48 1.00     2499     2123

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.52      0.55     2.64     4.79 1.00     2293     2489

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
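As a rough sanity check on these numbers, one option is to compare them with an ordinary least squares fit of the same model, which needs only base R. This is a sketch for illustration rather than part of the installation process. The least squares slope for wt is noticeably more negative than the posterior mean of -3.79 reported above. That difference is expected, since the normal(0, 1) prior pulls the coefficient toward zero.

```r
# Ordinary least squares fit of the same model, for comparison
ols_mpg_fit <- lm(mpg ~ wt, data = mtcars)

# Point estimates: intercept of about 37.29 and slope of about -5.34
round(coef(ols_mpg_fit), 2)

# Residual standard deviation (about 3.05), comparable to sigma above
round(sigma(ols_mpg_fit), 2)
```

If the brms estimates were wildly different from these, for example with the opposite sign on wt, that would suggest a problem with the model specification rather than with the installation.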